Aroyehun AA, Sabejeje TA*, Bayo-Lebi D, Olawuyi NJ, Ayinla NJ, Ogunwale YE


1. Department of Computer Science, Adeyemi College of Education, Ondo, Nigeria 
2. Department of Biology, Adeyemi College of Education, Ondo, Nigeria 
3. Osun State College of Education, Ilesa, Osun State, Nigeria


Received: 5 March 2017 
Accepted: 2 April 2017 
Published: 1 May 2017 


Aroyehun AA, Sabejeje TA, Bayo-Lebi D, Olawuyi NJ, Ayinla NJ, Ogunwale YE. Fuzzy logic based predictive model for likelihood of 
water related disease. Discovery, 2017, 53(257), 321-333 


© The Author(s) 2017. Open Access. This article is licensed under a Creative Commons Attribution License 4.0 (CC BY 4.0). 




Fuzzy logic-based systems have been applied to a number of problems in medicine, especially the development of diagnostic
systems, and have been reported to produce accurate results. In this paper, a fuzzy logic-based system is presented that simulates
a predictive model for the likelihood of a water-related disease (malaria). Knowledge was elicited from an expert at the Medical
Centre, Osogbo, Osun State, Nigeria, and was used to develop the rule base; the prediction model was then simulated using the
MATLAB software. The results of the fuzzification and defuzzification of the variables, the inference engine definition and the model
testing are also presented, and they show that the fuzzy logic-based model is very useful for predicting the likelihood of
water-related disease (malaria) in South Western Nigeria.


Keywords: fuzzy logic, prediction model, water related disease, likelihood 


1. INTRODUCTION 


Water-borne diseases are diseases caused by contaminated water (Lenntech, 2014). They include any illness caused by drinking
water contaminated by human or animal faeces containing pathogenic microorganisms (Lenntech, 2014). Water-borne diseases
spread through contamination of drinking water systems with the urine and faeces of infected animals or people. In Nigeria,
contamination of drinking water with pathogens has also been reported in several towns (Bai et al., 2007).

Water-borne outbreaks of enteric diseases have occurred either when public drinking water supplies were not adequately
treated after contamination with surface water, or when surface waters contaminated with enteric pathogens were used for
recreational purposes (Johnson et al., 2003). Today, only 58% of Nigerians have access to safe water (UNICEF and WHO, 2012). Thus,
most households, especially in rural and sub-urban communities, have to resort to drinking water from wells and streams. These
water sources are largely untreated and may harbor water-borne and vector-borne diseases such as cholera, typhoid fever,
diarrhea, hepatitis and guinea worm (Rahman et al., 2001; Adekunle, 2004; Fenwick, 2006).

In developing countries, particularly Nigeria, the two main water problems people contend with are the quantity and the
quality of water (Adeniyi, 2004; Olajuyigbe, 2010). Because of its pattern of occurrence and distribution, water is not easily
available in the desired amount and quality. These factors have led to the growing rate of water-borne diseases such as
typhoid fever and cholera in this part of the world (Edwards, 1993).

Although water-related diseases have largely been eliminated in wealthier nations, they remain a major concern in much of the
developing world. In its 2000 assessment, the World Health Organization estimated that there are four billion cases of diarrhea
each year, in addition to millions of other cases of illness associated with the lack of access to clean water. Since many
illnesses are undiagnosed and unreported, the true extent of these diseases is unknown. Water-related diseases are typically placed
in four classes: waterborne, water-washed, water-based, and water-related insect vectors. The first three are most clearly associated
with lack of improved domestic water supply. Malaria, the disease considered in this study, belongs to the last class, because its
mosquito vectors breed in water.

Fuzzy logic provides a path for diagnosis and decision making because of its ability to deal with the uncertainty (fuzziness) and
ambiguity that may exist in the knowledge and information relating to a domain of study. Medical practitioners have identified
possible and promising areas for implementing fuzzy logic systems in medical diagnosis (Mishra et al., 2014). Fuzzy logic was
introduced by Lotfi A. Zadeh in 1965, based on fuzzy set theory. Fuzzy logic systems are implemented by manipulating membership
functions, which represent the variables, through an inference engine (rule base).

Membership functions (MFs) are curves that define how each point in the input and output space is mapped to a membership
value (or degree of membership) between 0 and 1. This implies that, for every label of each variable, a membership function is
used to define the degree to which an entered value belongs to that label. Unlike a classical (crisp) set, a fuzzy set may be
defined as follows:

If X is a universe of discourse whose elements are denoted by x, then a fuzzy set A in X is defined as a set of ordered pairs:

$$A = \{(x, \mu_A(x)) \mid x \in X\} \qquad (1)$$


$\mu_A(x)$ is called the membership function (MF) of x in A. The membership function maps each element of X to a membership
value between 0 and 1. For the purpose of this study, the following should be noted:


i. The set A is any input (malaria risk factor) or output (likelihood of malaria) variable considered in this study;
ii. The set X is the set of values for which a variable is valid; for example, the set A = presence of stagnant water is valid for
x = 0 (No) and x = 1 (Yes), so for this A the set X is {0, 1}; and
iii. $\mu_A(x)$ is the membership function used to plot the degree of membership.
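To make Equation (1) concrete, the minimal sketch below represents a fuzzy set as a list of ordered pairs (x, μ_A(x)) over a small, discretised universe. The five-point universe X and the helper names `trimf` and `fuzzy_set` are illustrative assumptions, not part of the original model; the triangular parameters [0.1 0.5 0.9] are those given for the Fair label of presence of bushes in Section 3.5.

```python
from typing import Callable, List, Tuple


def trimf(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function with feet at a and c and its peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)  # rising edge
    return (c - x) / (c - b)      # falling edge


def fuzzy_set(universe: List[float], mu: Callable[[float], float]) -> List[Tuple[float, float]]:
    """Build A = {(x, mu_A(x)) | x in X} as a list of ordered pairs."""
    return [(x, mu(x)) for x in universe]


# Hypothetical discretised universe of discourse X (illustration only).
X = [0.0, 0.25, 0.5, 0.75, 1.0]
# Label "Fair" of presence of bushes: triangular MF with parameters [0.1 0.5 0.9].
A = fuzzy_set(X, lambda x: trimf(x, 0.1, 0.5, 0.9))
print(A)
```

Every second component lies in [0, 1], as the definition requires; points outside (0.1, 0.9) have zero membership in this label.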


Water-related diseases such as malaria are a serious threat to human life, and it is believed that their toll can be reduced by early
prediction of their likelihood, hence this study.

This paper is aimed at developing a fuzzy logic-based system that predicts the likelihood of water-related disease (malaria) in South
Western Nigeria from a 3-tuple record consisting of the presence of bushes in the environment, the presence of stagnant water and
the use of mosquito nets. The study is limited to knowledge elicited from a physician located in western Nigeria, based on experience
gathered in diagnosing the likelihood of water-related disease in patients in western Nigeria.


2. RELATED WORKS 


Craun et al. (2006) examined waterborne outbreaks reported in the United States and discussed epidemic waterborne risks.
Although the then-reported outbreak statistics did not reflect the true incidence of waterborne illness, outbreak surveillance has
provided information about the important waterborne pathogens, the relative degrees of risk associated with water sources and
treatment processes, and the adequacy of regulations. Pathogens and water system deficiencies identified in outbreaks may also be
important causes of endemic waterborne illness. In recent years, investigators have identified a large number of pathogens
responsible for outbreaks, and research has focused on their sources, their resistance to water disinfection, and their removal
from drinking water. Outbreaks in surface water systems decreased in the preceding decade, most likely because of new regulations
and improved treatment efficacy. Of increasing importance, however, are outbreaks caused by microbial contamination of water
distribution systems. The authors suggested that, to better estimate waterborne risks in the United States, more information is
needed about the contribution of distribution system contaminants to endemic waterborne risks and to undetected waterborne
outbreaks.

Idowu (2012) developed a web-based geo-spatial environmental health tracking system for Southwestern Nigeria. The study assessed
the problem of environmental health and developed a spatial environmental health data model and predictive models to forecast the
likelihood of environmental health related diseases, with a view to prototyping the models for environmental health tracking. Data
were collected from twenty-four purposively selected local government areas within Southwestern Nigeria, four from each of the six
states. Observation and personal interviews (both structured and unstructured) were also used to identify and assess environmental
health problems within Southwestern Nigeria. The spatial environmental health data model was designed using the Unified Modelling
Language (UML). The model to predict the likelihood of environmental related diseases from environmental health problems was
formulated using the MATLAB Fuzzy Logic Toolbox, and the prototype was developed using MySQL and PHP. Data collected from the
local government areas were used to validate the performance of the model. The results showed that when general sanitation, water,
toilet facility and refuse disposal facility had a probability of 0.000, the probability that environmental related diseases could occur
was 0.870; when they had a probability of 0.500, it was 0.581; and when they had a probability of 1.00, it was 0.130. In addition, the
performance of the environmental health tracking system was assessed on three occasions and the average over the three occasions
was recorded. The system was accessed over 4 different mobile broadband networks at a radius of 100 m from their base stations. On
average, the response times for the 4 mobile broadband networks were 2.60, 2.60, 3.00 and 3.00 seconds respectively, so the average
response time to access the system over any mobile broadband network in Nigeria was 2.80 seconds. In conclusion, the environmental
health tracking system allows real-time tracking of environmental health problems, with the ability to forecast the possibility of
environmental health related diseases within the study area.

Hopkins (1984), in "Waterborne Disease in Colorado: Three Years' Surveillance and 18 Outbreaks", observed a steady increase in the
number of reported waterborne disease outbreaks in the United States since the early 1960s. The average annual number reported was
32 for 1971-72 and 41 for 1980-82, and the actual number of outbreaks that occur is certainly greater than this. Over the three-year
period before June 1980, six waterborne outbreaks were reported in Colorado, but about 20 additional clusters of gastrointestinal
illness were suspected to be waterborne. Energy development, tourism and population growth combine to place stress on water
systems that have typically relied on protected or remote surface sources of supply with marginal treatment. Before June 1980, the
Colorado Department of Health had a passive waterborne outbreak surveillance system, and responsibility for following up
water-related complaints was divided among two Sections of the Water Quality Control Division and the Communicable Disease Control
Section. From June 1980 through June 1983, a field epidemiologist was made responsible for improving the detection of waterborne
illness. The project was housed in the Communicable Disease Control Section, Division of Disease Control and Epidemiology, which
receives reports of communicable disease cases from the state's 63 counties. The Colorado Department of Health conducted intensive
surveillance for waterborne diseases during the three-year period July 1, 1980 to June 30, 1983, and eighteen outbreaks of waterborne
illness were investigated. Outbreaks involved from 15 to 1,500 ill persons. Giardia lamblia was confirmed or suspected as the agent in
nine outbreaks, rotavirus in one, and no agent could be identified in eight. Seventeen outbreaks occurred on surface-water systems,
none of which had adequate chemical pretreatment and filtration.


3. MATERIALS AND METHODS 


3.1. Research design

In this paper, a fuzzy logic-based prediction model is proposed with the aim of predicting the likelihood of water-related disease in
South Western Nigeria. The study started with the identification of the problem of predicting the likelihood of water-related disease
given a number of symptoms/factors considered as input variables (3 in all). A review of related literature was performed to identify
and understand water-related disease and its symptoms, in addition to related work done in the past. Following this, knowledge was
elicited from an expert (medical practitioner) located at the primary health centre, Osogbo, Osun State, to understand and verify the
information concerning the symptoms of water-related disease.

The elicited knowledge was used to build the inference engine of the proposed system. This is part of the model formulation
technique, which also includes the fuzzification of the input and output variables. The model formulation is completed by identifying
the aggregation method chosen for the inference engine, together with the defuzzification method required to produce the output
variable, which is the likelihood of water-related disease (No, Probably or High).


3.2. Data identification and collection 

A number of symptoms/factors are known to be connected to the likelihood of malaria; among these, 3 were identified as the most
important and relevant: the presence of bushes in the environment, the presence of stagnant water and the use of mosquito nets. This
information was collected via a structured interview with the medical practitioner, who identified the factors and emphasized these 3
as the ones most easily used to identify the likelihood of malaria, based on his experience in medical practice. Each input factor is
described by linguistic labels such as No, Fair and Yes, while the likelihood of malaria is classified as No, Probably or High.

In addition to identifying the data variables, understanding their pattern of distribution was important in selecting the best
membership function for plotting the labels of each variable. The number of rules required by the fuzzy logic inference system was
calculated by multiplying together the numbers of labels of the input variables; with three labels for each of the three inputs, this
gives 3 × 3 × 3 = 27 different rules. This information was necessary in the development of the fuzzy logic inference system.
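The rule count follows from the Cartesian product of the input labels. The minimal sketch below enumerates every rule antecedent, assuming three labels (No, Fair, Yes) for each input as they appear in Table 1; note that Section 3.5 lists only No and Yes for stagnant water, so this follows the rule table rather than the membership function definitions. The dictionary keys are illustrative names.

```python
from itertools import product

# Labels per input variable, in the order used by the rule base (Table 1).
labels = {
    "presence_of_bushes": ["No", "Fair", "Yes"],
    "presence_of_stagnant_water": ["No", "Fair", "Yes"],
    "use_of_mosquito_nets": ["No", "Fair", "Yes"],
}

# One antecedent (IF part) per combination of labels.
antecedents = list(product(*labels.values()))

print(len(antecedents))  # 27
print(antecedents[0])    # ('No', 'No', 'No')   -> rule 1
print(antecedents[15])   # ('Fair', 'Yes', 'No') -> rule 16
```

Because the last label varies fastest, the enumeration order matches the row order of Table 1.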


3.3. Fuzzy logic model formulation 
Fuzzy logic systems can make decisions about, and control, a system using the knowledge of an expert. They are most useful in
sophisticated environments where a clear and explicit model of the system cannot be obtained. Developing the fuzzy logic system
required to predict the likelihood of malaria involves a number of activities. The fuzzy logic system available in the Fuzzy Logic
Toolbox of the MATLAB R2012a software has three parts:

• A set of inputs, represented by their respective membership functions;
• An inference engine, which contains the IF-THEN rules (domain knowledge); and
• An output, represented by its membership functions.
The membership functions are used to map the values of each input and output variable into the interval [0, 1] using triangular and
trapezoidal membership functions (where appropriate); this process is referred to as fuzzification. After fuzzification, the fuzzified
inputs are mapped to the fuzzified output via operators (AND, OR and NOT) in IF-THEN rules that describe the relationship between
the inputs (water-related disease likelihood factors) and the output (likelihood of the disease). The rules produce different results,
which are then aggregated into a single fuzzified output. This fuzzified output is then defuzzified using the centroid method, which
computes the centre of gravity of the aggregated polygon to determine the label of the output variable (No, Probably or High).


The most prominent reasons that justify the use of fuzzy logic systems today are:
• the complexity of the natural world, which calls for an approximate (fuzzy) description for modelling; and
• the need for a framework to formulate human knowledge and apply it to actual systems.


The process of developing the fuzzy inference system needed for the prediction of water-related disease may be summarized as
follows:
• Fuzzification of inputs and outputs;
• Construction of the inference engine;
• Rule aggregation; and
• Defuzzification of the output variable.


3.4. Defining membership functions 

Before fuzzification, it is important to describe the crisp values used in mapping the membership functions needed by the fuzzy logic
system. For discrete variables with nominal or Boolean (yes/no) values, the codes 0, 1, 2, ..., n - 1 were assigned to the n labels and
expressed on the interval [0, 1]; for the malaria risk factors, for example, No = 0, Fair = 0.5 and Yes = 1. For continuous (measured)
variables, percentages were expressed as proportions, so 0%, 50% and 100% were entered into the appropriate membership functions
as 0, 0.5 and 1 respectively.
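The encoding above can be sketched as follows. The text assigns the integer codes 0, 1, ..., n - 1 to n labels but then uses No = 0, Fair = 0.5 and Yes = 1; this minimal sketch assumes the integer code is divided by n - 1 to bring it into [0, 1]. The function names are hypothetical.

```python
from typing import Sequence


def encode_label(label: str, labels: Sequence[str]) -> float:
    """Map the i-th of n ordered labels to i / (n - 1), a crisp value in [0, 1]."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two labels to scale into [0, 1]")
    return labels.index(label) / (n - 1)


def percent_to_proportion(percent: float) -> float:
    """Express a measured percentage as a proportion, e.g. 50 % -> 0.5."""
    return percent / 100.0


factor_labels = ["No", "Fair", "Yes"]
print([encode_label(lab, factor_labels) for lab in factor_labels])  # [0.0, 0.5, 1.0]
print([percent_to_proportion(p) for p in (0, 50, 100)])             # [0.0, 0.5, 1.0]
```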


3.5. Fuzzification of the variables 

For the purpose of this study, triangular and trapezoidal membership functions were used to map the degree of membership of the
labels of each input and output variable. The following is a description of each variable, the type of membership function used for
its labels and the parameters used to map the degree of membership of each label.


a. The malaria outbreak prediction model 
Presence of Bush in the environment = (No [-0.4 0 0.4], Fair [0.1 0.5 0.9], Yes [0.6 1 1.4]) 


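$$
\mathrm{PresenceOfBush}(x;\ \text{No}: -0.4,\ 0,\ 0.4) =
\begin{cases}
0, & x < -0.4 \\
\dfrac{x + 0.4}{0.4}, & -0.4 \le x \le 0 \\
\dfrac{0.4 - x}{0.4}, & 0 \le x \le 0.4 \\
0, & 0.4 < x
\end{cases}
\qquad (2)
$$

$$
\mathrm{PresenceOfBush}(x;\ \text{Fair}: 0.1,\ 0.5,\ 0.9) =
\begin{cases}
0, & x < 0.1 \\
\dfrac{x - 0.1}{0.4}, & 0.1 \le x \le 0.5 \\
\dfrac{0.9 - x}{0.4}, & 0.5 \le x \le 0.9 \\
0, & 0.9 < x
\end{cases}
\qquad (3)
$$

$$
\mathrm{PresenceOfBush}(x;\ \text{Yes}: 0.6,\ 1,\ 1.4) =
\begin{cases}
0, & x < 0.6 \\
\dfrac{x - 0.6}{0.4}, & 0.6 \le x \le 1 \\
\dfrac{1.4 - x}{0.4}, & 1 \le x \le 1.4 \\
0, & 1.4 < x
\end{cases}
\qquad (4)
$$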
Figure 1 Membership function for the presence of bushes in the environment


Presence of stagnant water = (No [-0.4 0 0.5], Yes [0.5 1 1.5]) 
Figure 2 Membership function for presence of stagnant water


Use of mosquito nets = (Yes [-0.4 0 0.4], Fair [0.1 0.5 0.9], No [0.6 1 1.4])
Figure 3 Membership function for use of mosquito nets


Likelihood of malaria outbreak = (No [-0.4 0 0.3], Probably [0.3 0.5 0.6], High [0.6 1 1.4]) 
Figure 4 Membership function for likelihood of malaria outbreak
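The parameter lists above can be collected into one table and used to fuzzify a crisp input. This is a minimal sketch, assuming every [a b c] triple denotes a triangular membership function (MATLAB's `trimf` takes its parameters in the same [a b c] form); the variable keys and the `fuzzify` helper are hypothetical names.

```python
def trimf(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function [a b c]: 0 outside (a, c), 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# [a b c] parameters transcribed from Section 3.5.
MFS = {
    "bushes": {"No": (-0.4, 0, 0.4), "Fair": (0.1, 0.5, 0.9), "Yes": (0.6, 1, 1.4)},
    "stagnant_water": {"No": (-0.4, 0, 0.5), "Yes": (0.5, 1, 1.5)},
    # Mosquito nets are ordered in reverse: "Yes" peaks at 0 and "No" at 1.
    "mosquito_nets": {"Yes": (-0.4, 0, 0.4), "Fair": (0.1, 0.5, 0.9), "No": (0.6, 1, 1.4)},
    "malaria_likelihood": {"No": (-0.4, 0, 0.3), "Probably": (0.3, 0.5, 0.6), "High": (0.6, 1, 1.4)},
}


def fuzzify(variable: str, x: float) -> dict:
    """Return the degree of membership of crisp value x in every label of a variable."""
    return {label: trimf(x, *abc) for label, abc in MFS[variable].items()}


print(fuzzify("bushes", 0.2))       # No ≈ 0.5, Fair ≈ 0.25, Yes = 0.0
print(fuzzify("mosquito_nets", 0))  # {'Yes': 1.0, 'Fair': 0.0, 'No': 0.0}
```

A crisp value can belong to two neighbouring labels at once (0.2 is partly No and partly Fair for presence of bushes), which is what lets several rules fire together.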


3.6. Inference engine development, aggregation and defuzzification 

After the membership functions have been developed, the fuzzy inference engine is built from the 27 rules shown in Table 1 below.
The rules were formed from the labels used to map each interval of the membership functions; for example, the presence of bushes
has the labels No, Fair and Yes.

Each rule is the result of a case-based reasoning approach: it reflects a pattern the expert has observed over years of practice,
excluding cases of misdiagnosis (false positives) and undiagnosed cases (cases not yet understood).

For the purpose of this study and the variables considered, the AND method used to evaluate the degree of membership is Minimum
(it selects the smallest of several values), the OR method is Maximum (it selects the largest of several values), although OR is not
used in this study, and the implication method is Minimum. These fuzzy operators were used to calculate the output of each rule; the
rule outputs then require aggregation to produce a single output.
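The AND and implication operators described above reduce to taking minima. The minimal sketch below evaluates one rule on hypothetical membership degrees and clips a sampled consequent membership function at the rule's firing strength; the sample values are illustrative, not taken from the study.

```python
from typing import List, Sequence


def and_min(degrees: Sequence[float]) -> float:
    """AND method = Minimum: a rule fires only as strongly as its weakest antecedent."""
    return min(degrees)


def or_max(degrees: Sequence[float]) -> float:
    """OR method = Maximum (defined for completeness; not used in this study)."""
    return max(degrees)


def implicate_min(strength: float, consequent: Sequence[float]) -> List[float]:
    """Implication method = Minimum: clip the consequent MF at the firing strength."""
    return [min(strength, m) for m in consequent]


# Hypothetical degrees for (bushes, stagnant water, mosquito nets) in one rule.
strength = and_min([0.5, 0.8, 0.25])
# Hypothetical consequent MF sampled at five points of the output universe.
clipped = implicate_min(strength, [0.0, 0.5, 1.0, 0.5, 0.0])
print(strength, clipped)  # 0.25 [0.0, 0.25, 0.25, 0.25, 0.0]
```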

The aggregation method used to determine the output membership function is Maximum (it selects the largest value for every region of
the output variable's membership functions). This method was chosen because it is the most commonly used method for aggregating
piecewise-linear membership functions such as triangular and trapezoidal membership functions.
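Max aggregation combines the clipped outputs of all fired rules point by point. A minimal sketch, assuming all rule outputs are sampled on the same points of the output universe (the two rule outputs below are hypothetical):

```python
from typing import List, Sequence


def aggregate_max(rule_outputs: Sequence[Sequence[float]]) -> List[float]:
    """Aggregation method = Maximum: keep the largest clipped value at each sample point."""
    return [max(values) for values in zip(*rule_outputs)]


# Two hypothetical clipped rule outputs sampled at the same four points.
rule_a = [0.0, 0.25, 0.25, 0.0]
rule_b = [0.0, 0.0, 0.6, 0.6]
print(aggregate_max([rule_a, rule_b]))  # [0.0, 0.25, 0.6, 0.6]
```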

Defuzzification of the output membership function resulting from aggregation gives the crisp likelihood of water-related disease as a
real number (a value within the range of the output variable's membership functions). In this study, the values 0, 0.5 and 1 identify
No, Probably and High respectively.
Figure 5 Proposed fuzzy inference system


Table 1 Rule base for likelihood of malaria outbreak

| Rule No | Presence of bushes in the environment | Presence of stagnant water | Use of mosquito nets | Likelihood of malaria outbreak |
|---|---|---|---|---|
| 1 | No | No | No | No |
| 2 | No | No | Fair | No |
| 3 | No | No | Yes | No |
| 4 | No | Fair | No | Probably |
| 5 | No | Fair | Fair | Probably |
| 6 | No | Fair | Yes | No |
| 7 | No | Yes | No | Probably |
| 8 | No | Yes | Fair | Probably |
| 9 | No | Yes | Yes | No |
| 10 | Fair | No | No | Probably |
| 11 | Fair | No | Fair | No |
| 12 | Fair | No | Yes | No |
| 13 | Fair | Fair | No | Probably |
| 14 | Fair | Fair | Fair | Probably |
| 15 | Fair | Fair | Yes | Probably |
| 16 | Fair | Yes | No | High |
| 17 | Fair | Yes | Fair | Probably |
| 18 | Fair | Yes | Yes | High |
| 19 | Yes | No | No | No |
| 20 | Yes | No | Fair | Probably |
| 21 | Yes | No | Yes | High |
| 22 | Yes | Fair | No | Probably |
| 23 | Yes | Fair | Fair | Probably |
| 24 | Yes | Fair | Yes | No |
| 25 | Yes | Yes | No | High |
| 26 | Yes | Yes | Fair | High |
| 27 | Yes | Yes | Yes | Probably |
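Table 1 can be held as a lookup from antecedent labels to the consequent label. In this minimal sketch the consequents are written as one-letter codes in rule order 1 to 27 (N = No, P = Probably, H = High); generating the antecedents with `itertools.product` reproduces the row order of the table, since the mosquito-net label varies fastest. The code string and the names are illustrative.

```python
from itertools import product

LABELS = ["No", "Fair", "Yes"]
NAMES = {"N": "No", "P": "Probably", "H": "High"}
# Consequents of rules 1-27 in Table 1, nine rules per label of presence of bushes.
CODES = "NNNPPNPPN" + "PNNPPPHPH" + "NPHPPNHHP"

# (bushes, stagnant water, mosquito nets) -> likelihood of malaria outbreak
RULE_BASE = {
    antecedent: NAMES[code]
    for antecedent, code in zip(product(LABELS, repeat=3), CODES)
}

print(len(RULE_BASE))                    # 27
print(RULE_BASE[("Fair", "Yes", "No")])  # High (rule 16)
print(RULE_BASE[("Yes", "Yes", "Yes")])  # Probably (rule 27)
```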


The method of defuzzification chosen for this study is the centroid method, which calculates the centre of gravity of the final polygon
that results from aggregation. It was also chosen for its compatibility with piecewise-linear membership functions. Figure 5 shows the
diagram of the simulated fuzzy logic system for predicting the likelihood of malaria given the values of three (3) input variables,
namely: presence of stagnant water, use of mosquito nets and presence of bushes in the environment. This is the view of the fuzzy
inference system in the Fuzzy Logic Toolbox of the MATLAB R2012a software, which was used as the simulation environment.
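On a sampled output universe, the centroid is the membership-weighted mean of the sample points, Σ x·μ(x) / Σ μ(x). The sketch below is a discrete approximation under that assumption; the sample grid is hypothetical.

```python
from typing import Sequence


def centroid(xs: Sequence[float], mu: Sequence[float]) -> float:
    """Centre of gravity of the aggregated polygon, approximated on sample points xs."""
    area = sum(mu)
    if area == 0:
        raise ValueError("empty aggregated output: no rule fired")
    return sum(x * m for x, m in zip(xs, mu)) / area


# A symmetric triangle peaking at 0.5 defuzzifies to its midpoint.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
mu = [0.0, 0.5, 1.0, 0.5, 0.0]
print(centroid(xs, mu))  # 0.5
```

A finer grid brings the result closer to the continuous centre of gravity of the polygon.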
Figure 6 Surface diagram showing use of mosquito nets and presence of bushes


4. RESULTS AND DISCUSSIONS 


After the model required for simulating the fuzzy logic inference system was formulated, it was implemented using MATLAB version 7
(Release 2012). The Fuzzy Logic Toolbox, one of the many toolboxes available in MATLAB, was used to simulate the predictive model,
using triangular and trapezoidal membership functions to fuzzify the input and output variables. The fuzzy logic system was also used
to generate surface diagrams, which show the distribution of the possible output values and the relationship between any two input
variables and the output. Figures 6 and 7 show the surface diagrams for the use of mosquito nets against the presence of bushes and
against the presence of stagnant water respectively; the diagrams clearly show where the likelihood of malaria cases is higher.
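A surface diagram is obtained by evaluating the complete inference chain over a grid of input values. The sketch below reproduces that chain (minimum AND, minimum implication, maximum aggregation, centroid defuzzification) in plain Python rather than MATLAB, so its numbers are not the paper's results. It makes two assumptions that should be checked against the original model: the output universe is [0, 1] sampled at 101 points, and, because Section 3.5 gives no Fair set for stagnant water although Table 1 uses one, a hypothetical Fair set [0.1 0.5 0.9] is borrowed from the other inputs.

```python
from itertools import product
from typing import Optional


def trimf(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function [a b c]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


BUSHES = {"No": (-0.4, 0, 0.4), "Fair": (0.1, 0.5, 0.9), "Yes": (0.6, 1, 1.4)}
# ASSUMPTION: "Fair" for stagnant water is hypothetical (absent from Section 3.5).
STAGNANT = {"No": (-0.4, 0, 0.5), "Fair": (0.1, 0.5, 0.9), "Yes": (0.5, 1, 1.5)}
NETS = {"Yes": (-0.4, 0, 0.4), "Fair": (0.1, 0.5, 0.9), "No": (0.6, 1, 1.4)}
OUTPUT = {"No": (-0.4, 0, 0.3), "Probably": (0.3, 0.5, 0.6), "High": (0.6, 1, 1.4)}

# Table 1 consequents for rules 1-27 (N = No, P = Probably, H = High).
CODES = "NNNPPNPPN" + "PNNPPPHPH" + "NPHPPNHHP"
NAMES = {"N": "No", "P": "Probably", "H": "High"}
RULES = list(zip(product(["No", "Fair", "Yes"], repeat=3), (NAMES[c] for c in CODES)))

# ASSUMPTION: output universe [0, 1] sampled at 101 points.
XS = [i / 100 for i in range(101)]


def predict(bushes: float, stagnant: float, nets: float) -> Optional[float]:
    """Crisp likelihood of malaria, or None if no rule fires."""
    aggregated = [0.0] * len(XS)
    for (b, s, n), out_label in RULES:
        # AND = minimum over the three antecedent degrees.
        strength = min(trimf(bushes, *BUSHES[b]),
                       trimf(stagnant, *STAGNANT[s]),
                       trimf(nets, *NETS[n]))
        if strength == 0.0:
            continue
        for i, x in enumerate(XS):
            # Implication = minimum (clip); aggregation = maximum.
            clipped = min(strength, trimf(x, *OUTPUT[out_label]))
            aggregated[i] = max(aggregated[i], clipped)
    area = sum(aggregated)
    if area == 0.0:
        return None
    # Centroid (discrete centre of gravity) of the aggregated output.
    return sum(x * m for x, m in zip(XS, aggregated)) / area


# Coarse tabulation in the spirit of Figure 6, with stagnant water held at 1.0.
for nets in (0.0, 0.5, 1.0):
    row = [predict(b, 1.0, nets) for b in (0.0, 0.5, 1.0)]
    print(f"nets={nets}:", [round(v, 2) for v in row])
```

Evaluating `predict` on a finer grid of two inputs, with the third held fixed, gives the data behind a surface plot.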
Figure 7 Surface diagram showing use of mosquito nets and presence of stagnant water


5. CONCLUSION 
The proposed model for predicting the likelihood of water-related disease was presented using 3 input variables, namely: the
presence of bushes, the presence of stagnant water and the use of mosquito nets. The variables were identified, and knowledge defining
the relationship between them was used to develop the rule base of the fuzzy inference system. All the variables were fuzzified, and
the fuzzified input variables were fed to the inference engine. The 27 rule outputs produced by the inference engine were aggregated
into a single output, which was defuzzified to obtain the crisp output (No, Probably or High).

The model was simulated using the Fuzzy Logic Toolbox available in the MATLAB software, and the behavior of the proposed model was
presented via surface diagrams. It is believed that this model will help diagnose the likelihood of water-related disease given a record
containing the inputs as a 3-tuple. The model should help reduce the number of untimely deaths that occur as a result of the spread
of the disease.